ggplot2: Basics

Image by Allison Horst.

ggplot2: The package


ggplot2 is part of the tidyverse

#install.packages("tidyverse")
library(tidyverse)


… but it can of course also be loaded separately:

#install.packages("ggplot2")
library(ggplot2)

First steps.

Photo by Omar Lopez on Unsplash

The big picture

Start: ggplot()

Components

  1. Data.
  2. Aesthetic mapping between the data and visual properties.
  3. Layer(s) to render the data.
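
The three components can be seen together in a single call. A minimal sketch, assuming the `mpg` data set that ships with ggplot2 as a stand-in (it is not the data used in these slides):

```r
library(ggplot2)

# 1. Data: `mpg` is bundled with ggplot2 (stand-in, not the slide data)
# 2. Aesthetic mapping: engine displacement on x, highway mileage on y
# 3. Layer: points that render the data
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy)) +
  geom_point()

p
```

Each of the three parts is introduced step by step in the slides that follow.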

Introducing the data

library(jsonlite)
library(tidyverse)

# mode = "wb" keeps the zip archive intact on Windows
download.file(url = "https://github.com/open-numbers/ddf--gapminder--fasttrack/archive/refs/heads/master.zip",
              destfile = "gapminder_fasttrack_master.zip",
              mode = "wb")

unzip(zipfile = "gapminder_fasttrack_master.zip", exdir = "data")

gapminder_path <- "data/ddf--gapminder--fasttrack-master/"

# Relative path, consistent with unzip() above and the CSV paths below
json_data <- jsonlite::fromJSON(file.path(gapminder_path, "datapackage.json"))

if (file.exists("gapminder_fasttrack_master.zip")) {
  #Delete file if it exists
  file.remove("gapminder_fasttrack_master.zip")
}

wanted_keywords <- c("corrupt", "justice", "progress")

# Get paths and names
csv_paths <- json_data$resources$path
csv_names <- json_data$resources$name

# Find matching files by fuzzy keyword match in the name
matched_indices <- str_detect(csv_names, str_c(wanted_keywords, collapse = "|"))
matched_paths <- paste0(gapminder_path, csv_paths[matched_indices])
matched_names <- csv_names[matched_indices]

# Early exit if nothing matches
if (length(matched_paths) == 0) {
  stop("No files matched the specified keywords.")
}

# Initialize with the first matching file
merged_df <- read_csv(matched_paths[1])

# Read the remaining files (if any) and merge them one by one
for (path in matched_paths[-1]) {
  message("Reading file: ", path)
  # No `by` given: full_join() joins on all columns the two tables share
  merged_df <- full_join(merged_df, read_csv(path))
}


# Create timestamp string: e.g., "2025-04-08_14-30-15"
timestamp <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")

# Build filename with path
filename <- paste0("./data/gapminder_set_", timestamp, ".RDS")

# Save RDS
saveRDS(merged_df, filename)

if (dir.exists(gapminder_path)) {
  unlink(gapminder_path, recursive = TRUE)
}

# Read the population and CO2 tables
pop_world <- read.csv(here::here("raw_data", "pop.csv"))
co2_world <- read.csv(here::here("raw_data", "co2_pcap_cons.csv"))

# read.csv() prefixes numeric column names with "X"; strip it again
colnames(co2_world) <- gsub("^X", "", colnames(co2_world))

# Replace the Unicode minus sign (U+2212) with an ASCII hyphen, then convert to numeric
co2_world <- co2_world %>%
  mutate(across(-country, ~ as.numeric(gsub("−", "-", as.character(.x)))))

co2_world <- co2_world %>% 
  pivot_longer(cols = -country, 
               names_to = "year", 
               values_to = "co2")

Data

ggplot(data = movies_metadat)

Aesthetic mapping

To fill this empty canvas, we have to link the data to the visual properties we need:

mapping = aes()

Which visual properties are available depends on the type of plot. For now, the most important one is position, i.e. the x- and y-axes.
Other properties can be mapped as well, for example the colour of the points depending on categories in the data.
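
Mapping colour to a category could look like the following sketch. It assumes ggplot2's bundled `mpg` data set and its categorical column `class` as stand-ins; neither belongs to the data used in these slides.

```r
library(ggplot2)

# Assumption: `mpg` (bundled with ggplot2) stands in for the slide data.
# `class` is a categorical column, so each vehicle class gets its own colour.
p_colour <- ggplot(data = mpg,
                   mapping = aes(x = displ, y = hwy, colour = class)) +
  geom_point()

p_colour
```

Because the colour is set inside `aes()`, ggplot2 derives it from the data and adds a legend automatically.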

Aesthetic mapping: axes

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average))

Geometric Layers

ggplots are built from individual layers that are stacked on top of each other with a +.
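
Since a plot is an ordinary R object, layers can also be added one at a time. A small sketch of this, again assuming the bundled `mpg` data set as a stand-in; the object names are only illustrative:

```r
library(ggplot2)

# Assumption: `mpg` (bundled with ggplot2) stands in for the slide data.
base <- ggplot(mpg, aes(x = displ, y = hwy))  # canvas only, no layers yet

with_points <- base + geom_point()            # first layer
with_smooth <- with_points + geom_smooth()    # second layer, drawn on top

length(base$layers)         # number of layers on the empty canvas
length(with_smooth$layers)  # number of layers after adding two geoms
```

Layers are drawn in the order in which they are added, so later layers sit on top of earlier ones.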

geom_

Layers

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point()

More layers!

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point() +
  geom_smooth()

Titles/labels

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "Getting a bang for your buck: Are Movies with higher budget also better?",
    subtitle = "There doesn't seem to be a strong relation between movie budget and average rating.",
    x = "Movie budget",
    y = "Average vote"
  )

Style your plot: themes

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "Getting a bang for your buck: Are Movies with higher budget also better?",
    subtitle = "There doesn't seem to be a strong relation between movie budget and average rating.",
    x = "Movie budget",
    y = "Average vote"
  ) +
  theme_classic()

Exercise

Let’s take a deeper dive

Go through everything again in more detail here: what did we actually do? Don't get too lost in the basics; move on to deeper topics more quickly (scales, coord system …).
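
As a possible starting point for the scales and coordinate-system part, here is a hedged sketch. It assumes the bundled `mpg` data set as a stand-in, and the axis limits are chosen purely for illustration.

```r
library(ggplot2)

# Assumption: `mpg` (bundled with ggplot2) stands in for the slide data.
p_scaled <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_log10() +                  # scale: log-transform the x axis
  coord_cartesian(ylim = c(10, 45))  # coord: zoom in without dropping data

p_scaled
```

Scales control how data values are translated into visual values, while the coordinate system controls how positions are drawn on the plane.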

Saving
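
One way to write a plot to disk is `ggsave()`. The sketch below assumes the bundled `mpg` data set as a stand-in; the file name and dimensions are hypothetical.

```r
library(ggplot2)

# Assumption: `mpg` (bundled with ggplot2) stands in for the slide data.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output location; a temporary directory keeps the sketch side-effect free
out_file <- file.path(tempdir(), "example_plot.png")

# The file extension determines the output format
ggsave(filename = out_file, plot = p,
       width = 16, height = 9, units = "cm", dpi = 300)

file.exists(out_file)
```

Passing `plot = p` explicitly avoids relying on the last plot that was displayed.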

Colors

https://questionsindataviz.com/2023/12/29/what-makes-a-truly-terrible-map/